{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5_27Lz77SEy5"
   },
   "source": [
    "# Best Subset Selection: L0-Regression\n",
    "\n",
    "## Objective and Prerequisites\n",
    "\n",
    "In this example, you will learn how to perform linear regression with feature selection using mathematical programming. We’ll show you how to construct a mixed-integer quadratic programming (MIQP) model of this linear regression problem, implement this model in the Gurobi Python API, and generate an optimal solution.\n",
    "\n",
    "This modeling example is at the intermediate level, where we assume that you know Python and are familiar with the Gurobi Python API. In addition, you should have some knowledge about building mathematical optimization models.\n",
    "\n",
    "**Download the Repository** <br />\n",
    "You can download the repository containing this and other examples by clicking [here](https://github.com/Gurobi/modeling-examples/archive/master.zip). \n",
    "\n",
    "---\n",
    "## Motivation\n",
    "\n",
    "Linear regression was invented at the beginning of the 19th century and today, after more than 200 years, it is still used extensively in practical applications for description and prediction purposes:\n",
    "\n",
    "- In econometrics, it is useful to estimate the price elasticity of a particular product by regressing sales revenue on price and possibly other features such as demographics and competitor and retail information.\n",
    "- In health sciences, it can be applied to predict how long a patient will remain (i.e. length of stay) in the ER of a hospital based on patient information, triage assessment, medical test results, and the date/time of arrival.\n",
    "- In social sciences, it may shed light on future academic performance of students, so proactive measures can be taken to improve their learning outcomes.\n",
    "\n",
    "In general, linear regression is used to model the relationship between a continuous variable and other explanatory variables, which can be either continuous or categorical. When applying this technique, finding the subset of features that maximizes its perfomance is often of interest.\n",
    "\n",
    "---\n",
    "## Problem Description\n",
    "\n",
    "Linear regression is a supervised learning algorithm used to predict a quantitative response. It assumes that there is a linear relationship between the feature vector $x_i \\in \\mathbb{R}^d$ and the response $y_i \\in \\mathbb{R}$. Mathematically speaking, for sample $i$ we have $y_i = \\beta^T x_i + \\epsilon_i$, where $\\beta \\in \\mathbb{R}^d$ is the vector of feature weights, including the intercept, and  $\\epsilon_i$ is a normally-distributed random variable with zero mean and constant variance representing the error term. We can learn the weights from a training dataset with $n$ observations $\\{X \\in \\mathbb{M}^{nxd},y \\in \\mathbb{R}^n\\}$ by minimizing the Residual Sum of Squares (RSS): $e^Te =(y-X\\beta)^T (y-X\\beta)=\\beta^T X^T X\\beta- 2y^TX\\beta+y^T y$. The Ordinary Least Squares (OLS) method achieves this by taking the derivative of this quadratic and convex function and then finding the stationary point: $\\beta_{OLS}=(X^T X)^{-1} X^T y$.\n",
    "\n",
    "In practice, some of the features are in fact not associated with the response. By including them, we only add unnecessary complexity to the model and increase variance to the weight estimates. However, finding the best performing model is no simple task as there is an exponential number of candidates, as one has to test $\\sum_{s=1}^{d-1}{{d-1} \\choose s}$ models. Since OLS rarely yields estimates that are exactly zero, thus discarding the features related to them, we need to resort to feature selection methods. Popular methods include:\n",
    "\n",
    "- Subset selection, e.g. stepwise selection.\n",
    "- Dimensionality reduction, e.g. principal component regression.\n",
    "- Shrinkage, e.g. the Lasso.\n",
    "\n",
    "The Lasso has undoubtedly been the method of choice for the last decade. Basically it fits a model containing all $d$ predictors, while incorporating a budget constraint based on the L1-norm of $\\beta$, disregarding the intercept component. In fact, this method minimizes the RSS, subject to $\\sum_{l=1}^{d-1}\\mathopen|\\beta_l\\mathclose| \\leq s$, where $s$ is a hyper-parameter representing the budget that is usually tuned via cross-validation. This constraint has the effect of shrinking all weight estimates, allowing some of them to be exactly zero when $s$ is small enough. Finally, it is worth noting that the unconstrained version of the Lasso is more frequently used. This version solves an unconstrained optimization problem where $RSS + \\lambda \\sum_{l=1}^{d-1}\\mathopen|\\beta_l\\mathclose|$ is minimized, for a given value of the —modified— lagrangian multiplier $\\lambda \\in \\mathbb{R}^+$.  \n",
    "\n",
    "A similar formulation is now presented, where the L0-norm is used instead (although it is not really a proper norm). We now seek to minimize the RSS, subject to $\\sum_{l=1}^{d-1}I(\\beta_l \\neq 0) \\leq s$, where $I(\\beta_l \\neq 0)$ is an indicator function taking on the value of 1 if $\\beta_j \\neq 0$ and 0 otherwise. In this setting, $s$ represents the number of features to consider in the model. This optimization problem may be casted as a Mixed Integer Quadratic Program (MIQP). Traditionally, the feature selection problem has not been tackled this way because of the common belief in the statistics community that large-scale problems are intractable. But this is no longer the case, considering the computing power currently available and the performance of modern optimization solvers such as Gurobi.\n",
    "\n",
    "---\n",
    "## Solution Approach\n",
    "\n",
    "Mathematical programming is a declarative approach where the modeler formulates a mathematical optimization model that captures the key aspects of a complex decision problem. The Gurobi Optimizer solves such models using state-of-the-art mathematics and computer science. \n",
    "\n",
    "A mathematical optimization model has five components, namely:\n",
    "\n",
    "- Sets and indices.\n",
    "- Parameters.\n",
    "- Decision variables.\n",
    "- Objective function(s).\n",
    "- Constraints.\n",
    "\n",
    "We now present a MIQP formulation that finds the weight estimates for a linear regression problem, where exactly $s$ of those weights can be nonzero:\n",
    "\n",
    "### Sets and Indices\n",
    "\n",
    "$l \\in L$: Set of features, where the intercept is excluded.\n",
    "\n",
    "### Parameters\n",
    "\n",
    "$s \\in \\mathbb{N}$: Number of features to include in the model, ignoring the intercept.\n",
    "\n",
    "### Decision Variables\n",
    "\n",
    "$\\beta \\in \\mathbb{R}^d$: Vector of feature weights, where the j-th element represents the change in the response variable per unit-change of feature $j$. Recall that the last component corresponds to the intercept.\n",
    "\n",
    "$\\text{norm}_0 \\in \\mathbb{R}$: Number of non-zero elements of vector $\\beta$, disregarding the intercept.\n",
    "\n",
    "### Objective Function\n",
    "\n",
    "- **Training error**: Minimize the Residual Sum of Squares (RSS):\n",
    "\n",
    "\\begin{equation}\n",
    "\\text{Min} \\quad Z = \\beta^T X^T X\\beta- 2y^TX\\beta+y^T y\n",
    "\\tag{0}\n",
    "\\end{equation}\n",
    "\n",
    "### Constraints\n",
    "\n",
    "- **L0-norm**: Compute the number of features with non-zero weights:\n",
    "\n",
    "\\begin{equation}\n",
    "\\text{norm}_0 = \\sum_{l \\in L}I(\\beta_l \\neq 0)\n",
    "\\tag{1}\n",
    "\\end{equation}\n",
    "\n",
    "- **Budget constraint**: Exactly $s$ feature weights can be non-zero:\n",
    "\n",
    "\\begin{equation}\n",
    "\\text{norm}_0 = s\n",
    "\\tag{2}\n",
    "\\end{equation}\n",
    "\n",
    "This model, by means of constraint 2, implicitly considers all ${{d-1} \\choose s}$ feature subsets at once. However, we also need to find the value for $s$ that maximizes the performance of the regression on unseen observations. Notice that the training RSS decreases monotonically as more features are considered (which translates to relaxing the MIQP), so it is not advisable to use it as the performance metric. Instead, we will estimate the Mean Squared Error (MSE) via cross-validation. This metric is defined as $\\text{MSE}=\\frac{1}{n}\\sum_{i=1}^{n}{(y_i-\\hat{y}_i)^2}$, where $y_i$ and $\\hat{y}_i$ are the observed and predicted values for the ith observation, respectively. Then, we will fine-tune $s$ using grid search, provided that the set of possible values is quite small.\n",
    "\n",
    "---\n",
    "## Python Implementation\n",
    "\n",
    "In the following implementation, we use four main libraries:\n",
    "\n",
    "- **Numpy** for scientific computing.\n",
    "- **Scikit learn** for machine learning algorithms.\n",
    "- **Gurobi** for mathematical optimization.\n",
    "- **Matplotlib** for visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "r7BAwEkmSEy_"
   },
   "outputs": [],
   "source": [
    "%pip install gurobipy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "cBbsv7ywSEzA"
   },
   "outputs": [],
   "source": [
    "# Let's import all the necessary libraries\n",
    "\n",
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "from itertools import product\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.model_selection import cross_val_score\n",
    "from sklearn import linear_model\n",
    "from sklearn.metrics import mean_squared_error as mse\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "import gurobipy as gp\n",
    "from gurobipy import GRB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "aTc0wIyMSEzB"
   },
   "outputs": [],
   "source": [
    "# Create and deploy the optimization model\n",
    "\n",
    "# NOTE: This function assumes the design matrix features does not contain\n",
    "#a column for the intercept\n",
    "def miqp(features, response, non_zero, verbose=False):\n",
    "    \"\"\"\n",
    "    Deploy and optimize the MIQP formulation of L0-Regression.\n",
    "    \"\"\"\n",
    "    assert isinstance(non_zero, (int, np.integer))\n",
    "    # Create a Gurobi environment and a model object\n",
    "    with gp.Env() as env, gp.Model(\"\", env=env) as regressor:\n",
    "        samples, dim = features.shape\n",
    "        assert samples == response.shape[0]\n",
    "        assert non_zero <= dim\n",
    "\n",
    "        # Append a column of ones to the feature matrix to account for the y-intercept\n",
    "        X = np.concatenate([features, np.ones((samples, 1))], axis=1)  \n",
    "\n",
    "        # Decision variables\n",
    "        norm_0 = regressor.addVar(lb=non_zero, ub=non_zero, name=\"norm\")\n",
    "        beta = regressor.addMVar((dim + 1,), lb=-GRB.INFINITY, name=\"beta\") # Weights\n",
    "        intercept = beta[dim] # Last decision variable captures the y-intercept\n",
    "\n",
    "        regressor.setObjective(beta.T @ X.T @ X @ beta\n",
    "                               - 2*response.T @ X @ beta\n",
    "                               + np.dot(response, response), GRB.MINIMIZE)\n",
    "\n",
    "        # Budget constraint based on the L0-norm\n",
    "        regressor.addGenConstrNorm(norm_0, beta[:-1], which=0, name=\"budget\")\n",
    "\n",
    "        if not verbose:\n",
    "            regressor.params.OutputFlag = 0\n",
    "        regressor.params.timelimit = 60\n",
    "        regressor.params.mipgap = 0.001\n",
    "        regressor.optimize()\n",
    "\n",
    "        coeff = np.array([beta[i].X for i in range(dim)])\n",
    "        return intercept.X, coeff        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "E6dLWd6KSEzB"
   },
   "outputs": [],
   "source": [
    "# Define functions necessary to perform hyper-parameter tuning via cross-validation\n",
    "\n",
    "def split_folds(features, response, train_mask):\n",
    "    \"\"\"\n",
    "    Assign folds to either train or test partitions based on train_mask.\n",
    "    \"\"\"\n",
    "    xtrain = features[train_mask,:]\n",
    "    xtest = features[~train_mask,:]\n",
    "    ytrain = response[train_mask]\n",
    "    ytest = response[~train_mask]\n",
    "    return xtrain, xtest, ytrain, ytest\n",
    "\n",
    "def cross_validate(features, response, non_zero, folds, standardize, seed):\n",
    "    \"\"\"\n",
    "    Train an L0-Regression for each fold and report the cross-validated MSE.\n",
    "    \"\"\"\n",
    "    if seed is not None:\n",
    "        np.random.seed(seed)\n",
    "    samples, dim = features.shape\n",
    "    assert samples == response.shape[0]\n",
    "    fold_size = int(np.ceil(samples / folds))\n",
    "    # Randomly assign each sample to a fold\n",
    "    shuffled = np.random.choice(samples, samples, replace=False)\n",
    "    mse_cv = 0\n",
    "    # Exclude folds from training, one at a time, \n",
    "    #to get out-of-sample estimates of the MSE\n",
    "    for fold in range(folds):\n",
    "        idx = shuffled[fold * fold_size : min((fold + 1) * fold_size, samples)]\n",
    "        train_mask = np.ones(samples, dtype=bool)\n",
    "        train_mask[idx] = False\n",
    "        xtrain, xtest, ytrain, ytest = split_folds(features, response, train_mask)\n",
    "        if standardize:\n",
    "            scaler = StandardScaler()\n",
    "            scaler.fit(xtrain)\n",
    "            xtrain = scaler.transform(xtrain)\n",
    "            xtest = scaler.transform(xtest)\n",
    "        intercept, beta = miqp(xtrain, ytrain, non_zero)\n",
    "        ypred = np.dot(xtest, beta) + intercept\n",
    "        mse_cv += mse(ytest, ypred) / folds\n",
    "    # Report the average out-of-sample MSE\n",
    "    return mse_cv\n",
    "\n",
    "def L0_regression(features, response, folds=5, standardize=False, seed=None):\n",
    "    \"\"\"\n",
    "    Select the best L0-Regression model by performing grid search on the budget.\n",
    "    \"\"\"\n",
    "    dim = features.shape[1]\n",
    "    best_mse = np.inf\n",
    "    best = 0\n",
    "    # Grid search to find best number of features to consider\n",
    "    for i in range(1, dim + 1):\n",
    "        val = cross_validate(features, response, i, folds=folds,\n",
    "                             standardize=standardize, seed=seed)\n",
    "        if val < best_mse:\n",
    "            best_mse = val\n",
    "            best = i\n",
    "    if standardize:\n",
    "        scaler = StandardScaler()\n",
    "        scaler.fit(features)\n",
    "        features = scaler.transform(features)\n",
    "    intercept, beta = miqp(features, response, best)\n",
    "    return intercept, beta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5JXo-uw1SEzC"
   },
   "source": [
    "---\n",
    "## Benchmark\n",
    "\n",
    "We now compare the performance of the aforementioned approach w.r.t. OLS regression on all features and the Lasso. The Boston dataset is used for this purpose. This dataset measures the prices of 506 houses, along with 13 features that provide insights about their neighbourhoods. We will use the original feature terminology, so the interested reader can visit [this website](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) for more information.\n",
    "\n",
    "Note that 20% of the samples are reserved for computing the out-of-sample MSE. The resulting metrics are displayed in a bar chart (shown below) to facilitate the comparison between models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "N-rW7MFHSEzC"
   },
   "outputs": [],
   "source": [
    "# Define how the bar chart should be displayed\n",
    "\n",
    "def plot_bar_chart(performance):\n",
    "    \"\"\"\n",
    "    Display the performance of all three models in a bar chart.\n",
    "    \"\"\"\n",
    "    bar = plt.bar([1, 2, 3], performance, color=['r', 'g', 'y'],\n",
    "                  tick_label=['OLS', 'Lasso', 'L0-Regression'])\n",
    "    plt.title('Out-of-Sample MSE')\n",
    "    x1, x2, y1, y2 = plt.axis()\n",
    "    plt.axis((x1, x2, np.floor(np.min(performance)),\n",
    "              np.ceil(np.max(performance))))\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 469
    },
    "id": "GIw53gPCSEzD",
    "outputId": "7863c9e3-c861-4959-ba82-363101689433"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEICAYAAABGaK+TAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVyElEQVR4nO3de7SddX3n8fdHARlIBJUoiOGiCHS8ECR1qFS5FGnFKlYdkalcajXoWAUXl+WAWnSkrYwGqVNrcaEVTC0gKWgHRYZyWXSEMYmBAOFSKiCQQlCBUBW5fOeP5znDzuGcnH3IOQk/8n6ttRf7/J7f8+zv3pt89m//nstOVSFJas+z1ncBkqSnxgCXpEYZ4JLUKANckhplgEtSowxwSWqUAa5mJPlgknuSPJTkBeu7nvEkuS3J/uu7Dj3zGeAaSpIjkixL8osk/5bkr5NsOYn11yrUkmwMzAcOqKoZVfXTMfqckOTHfcDfmeTsp/p460OSfZJUkoWj2nfr2y8baDsoydIkDya5L8klSXbol52U5JH+dRi53b9On4zWCQNcE0pyDPBZ4DhgC2BPYHvg4iSbrKMyXgRsClw/To2HA4cC+1fVDGAucMk6qm0qrQReN+obxuHAzSN/JNkJOBM4hu792BH4EvD4wDpn9x90I7ctp71yrXMGuNYoyXOBTwEfrqrvVdUjVXUb8C66EH9P3+9vk3xmYL19ktzZ3z8L2A74Tj8aPH6cx3pOki8kubu/faFv2xm4qe92f5J/GmP13wQuqqpbAarq36rq9IFt/1GS5UlWJfnXJEeOrjXJ8UnuTbIiyduSHJjk5iQ/S3LCQP+Tknwrydn99pYk2W2c5/SsJB9LcmuSnyY5J8nz1/CS/xo4H3h3v/6z+9d6wUCfOcCPq+qS6qyqqvOq6o41bFfPQAa4JvI6upHval/rq+oh4LvAGyfaQFUdCtwBvKUfDZ4yTtcT6Ub3c4DdgNcCH6+qm4FX9H22rKr9xlj3KuCwJMclmdsH36B7gd8Hngv8EXBqktcMLN+6f57bAp8EvkL34bQH8Hrgk0leOtD/IOBc4PnA3wHn99M8o30EeBuwN/Bi4OfAX43z/EecCRzW3/9dum8ddw8sXwLsmuTUJPsmmTHB9vQMZYBrIlsB91XVo2MsW9Evnyp/CHy6qu6tqpV0I/9Dh1mxqr4BfJgu8C4H7k3ysYHl/6uqbu1HrJcD36cL5hGPACdX1SPA39M9r9P60e31dCH66oH+i6vqW33/+XThv+cYpR0JnFhVd1bVw8BJwDuTbLSG5/J/gOcn2YUuyM8ctfxfgX3oPmzOAe7rvwENBvm7ktw/cLt0vMdTuwxwTeQ+YKtxAmebfvmkJfnywA62kemJFwO3D3S7vW8bve52gzvoRtqrakFV7Q9sCXwA+HSS3+3XeVOSq/rpkPuBA1n9w+enVfVYf/+X/X/vGVj+S2AwIH8y8LiPA3eOVSvdNNM/jAQpsBx4jG5Of03OAv4E2Bf4h9ELq+qqqnpXVc2i+yB6A903mBHnVNWWA7d9J3g8NcgA10R+ADwMvH2wMcnmwJt4YkfhvwObDXTZetR2VrvsZVV9YGAH25/1zXfTBd6I7Vh96mBk3TsGd9CNsfyRqjoXuBZ4ZZLnAOcBnwNe1O/QuxDI+E97QrNH7iR5FvCSsWqlC/o3jQrTTavqrgm2fxbwX4ELq+oXa+pYVT+km+J65aSegZpngGuNquoBuqmMLyb5vSQb94ernUs36jyr77oUODDJ85NsDRw9alP3AC9lzb4JfDzJrCRb0c1Ff2OYOvvDHN+cZGa/4/BNdPPmVwObAM+hO8Lj0X7ZAcNsdw32SPL2/pvJ0XQfcleN0e/LwMlJtu/rnJXkoIk2XlU/pps3P3H0siS/neT9SV7Y/70r8NZxHl/PYAa4JtTvdDyBbgT7IF0o/gT4nX5eF7ogvwa4jW5+efQx2H9OF873Jzl2nIf6DLCIbuS8jG5n3WfG6Tvag32NdwD3A6cAH6yqK6tqFd3OxHPodiL+F+DbQ253PBcAB/fbOxR4ez8fPtpp/WN9P8kqupD9T8M8QF/7WKP6++kCe1k/hfQ9ummWwZ3DB486DvyhkcDXM0f8QQdpcpKcBOxUVe9Z37Vow+YIXJIaNWGAJ5md5NL+JIjrkxzVt++W5AfpTq/+Tn/ChyRpHZlwCiXJNsA2VbUkyUxgMd2JCV8Hjq2qy5O8F9ixqj4x3QVLkjoTjsCrakVVLenvr6I7jnVbYBfgir7bxcA7pqtISdKTjXs22Fj6w8d2pzsK4Tq6PeEXAP+ZgeNiR60zD5gHsPnmm++x6667rkW5krThWbx48X39SVurGfoolP403cvpTjde2B97+pfAC+gOk/pIVa3xGs1z586tRYsWTbp4SdqQJVlcVXNHtw81Au8v0nMesKCqFgJU1Y30J0P0V4t789SVK0mayDBHoQQ4A1heVfMH2kfOAnsW8HG6M84kSevIMMeB70V3ptl+6X4BZGmSA4FDktwM3Eh3DYivTWOdkqRRJpxCqaorGf+iP6dNbTmSpGF5JqYkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYN9av00lORT433S3xaW/Wntb5L0NOAI3BJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSoyYM8CSzk1yaZHmS65Mc1bfPSXJVkqVJFiV57fSXK0kaMcyZmI8Cx1TVkiQzgcVJLgZOAT5VVd9NcmD/9z7TV6okadCEAV5VK4AV/f1VSZYD2wIFPLfvtgVw93QVKUl6skldCyXJDsDuwNXA0cBFST5HNxXzuqkuTpI0vqF3YiaZAZwHHF1VDwIfBD5aVbOBjwJnjLPevH6OfNHKlSunomZJEkMGeJKN6cJ7QVUt7JsPB0bunwuMuROzqk6vqrlVNXfWrFlrW68kqTfMUSihG10vr6r5A4vuBvbu7+8H3DL15UmSxjPMHPhewKHAsiRL+7YTgPcDpyXZCPgVMG9aKpQkjWmYo1CuBMa7Mv8eU1uOJGlYnokpSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRm00UYcks4Ezga2Bx4HTq+q0JGcDu/TdtgTur6o501SnJGmUCQMceBQ4pqqWJJkJLE5ycVUdPNIhyeeBB6arSEnSk00Y4FW1AljR31+VZDmwLXADQJIA7wL2m8Y6JUmjTGoOPMkOwO7A1QPNrwfuqapbxllnXpJFSRatXLnyKRcqSVrd0AGeZAZwHnB0VT04sOgQ4JvjrVdVp1fV3KqaO2vWrKdeqSRpNcPMgZNkY7rwXlBVCwfaNwLeDuwxPeVJksYz4Qi8n+M+A1heVfNHLd4fuLGq7pyO4iRJ4xtmCmUv4FBgvyRL+9uB/bJ3s4bpE0nS9BnmKJQrgYyz7IipLkiSNBzPxJSkRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGjVhgCeZneTSJMuTXJ/kqIFlH05yU99+yvSWKkkatNEQfR4FjqmqJUlmAouTXAy8CDgIeHVVPZzkhdNZqCRpdRMGeFWtAFb091clWQ5sC7wf+Iuqerhfdu90FipJWt2k5sCT7ADsDlwN7Ay8PsnVSS5P8pvjrDMvyaIki1auXLnWBUuSOkMHeJIZwHnA0VX1IN3o/XnAnsBxwDlJMnq9qjq9quZW1dxZs2ZNUdmSpKECPMnGdOG9oKoW9s13Agur83+Bx4GtpqdMSdJowxyFEuAMYHlVzR9YdD6wX99nZ2AT4L5pqFGSNIZhjkLZCzgUWJZkad92AvBV4KtJrgN+DRxeVTUtVUqSnmSYo1CuBJ40t917z9SWI0kalmdiSlKjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNmjDAk8xOcmmS5UmuT3JU335SkruSLO1vB05/uZKkERsN0edR4JiqWpJkJrA4ycX9slOr6nPTV54kaTwTBnhVrQBW9PdXJVkObDvdhUmS1mxSc+BJdgB2B67um/4kybVJvprkeeOsMy/JoiSLVq5cuXbVSpL+v6EDPMkM4Dzg6Kp6EPhr4GXAHLoR+ufHWq+qTq+quVU1d9asWWtfsSQJGDLAk2xMF94LqmohQFXdU1WPVdXjwFeA105fmZKk0YY5CiXAGcDyqpo/0L7NQLc/AK6b+vIkSeMZ5iiUvYBDgWVJlvZtJwCHJJkDFHAbcOQ01CdJGscwR6FcCWSMRRdOfTmSpGF5JqYkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1KgJAzzJ7CSXJlme5PokR41afmySSrLV9JUpSRptoyH6PAocU1VLkswEFie5uKpuSDIbeCNwx7RWKUl6kglH4FW1oqqW9PdXAcuBbfvFpwLHAzVtFUqSxjSpOfAkOwC7A1cneStwV1VdMx2FSZLWbJgpFACSzADOA46mm1Y5EThgiPXmAfMAtttuu6dUpCTpyYYagSfZmC68F1TVQuBlwI7ANUluA14CLEmy9eh1q+r0qppbVXNnzZo1dZVL0gZuwhF4kgBnAMuraj5AVS0DXjjQ5zZgblXdN011SpJGGWYEvhdwKLBfkqX97cBprkuSNIEJR+BVdSWQCfrsMFUFSZKG45mYktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1Kihf5V+vcsafxRIa6NqfVcg6SlwBC5JjTLAJalRBrgkNaqdOXBJ0+6yy9zXNF322Wfq9zU5ApekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNmjDAk8xOcmmS5UmuT3JU3/7fk1ybZGmS7yd58fSXK0kaMcwI/FHgmKr6DWBP4ENJ/iPwP6rq1VU1B/hH4JPTV6YkabQJA7yqVlTVkv7+KmA5sG1VPTjQbXPAKyJJ0jqUmsSV6JLsAFwBvLKqHkxyMnAY8ACwb1WtHGOdecC8/s9dgJvWtuhGbAXct76L0NB8v9qzIb1n21fVrNGNQwd4khnA5cDJVbVw1LL/BmxaVX86FZU+EyRZVFVz13cdGo7vV3t8z4Y8CiXJxsB5wILR4d37O+AdU1mYJGnNhjkKJcAZwPKqmj/Q/vKBbm8Fbpz68iRJ4xnmaoR7AYcCy5Is7dtOAP44yS7A48DtwAempcJ2nb6+C9Ck+H61Z4N/zya1E1OS9PThmZiS1CgDXJIaZYA/BUlekuSCJLckuTXJaUk2SbJPkn8co//vJ/lRkmuS3JDkyPVR94YiyUPru4ZnsrFe3yTPSXJ2kn9JcnV/zshY656U5K7+Ehw3JDlk2gseUpIXJ/nW+q5jMgzwSeqPylkInF9VLwd2BmYAJ4/Tf2O6nS1vqardgN2By9ZNtdI688fAz6tqJ+BU4LNr6HtqfwmOg4C/6f+NrJUka/3zkFV1d1W9c223sy4Z4JO3H/CrqvoaQFU9BnwUeC+w2Rj9Z9Id7fPTvv/DVbWhnI36tJHkLf3I8EdJ/neSF/Xte/ejwaX9splJtklyRd92XZLX930PSbKsb1tTQG2IDgK+3t//FvA7/WBnXFV1C/AL4HkASY5L8sP+InmfGumX5BNJbkxycZJvJjm2b78syZ8luRw4KskeSS5PsjjJRUm26ft9pB/tX5vk7/u2sd73HZJc1y/fNMnX+vf7R0n27duPSLIwyff6b+CnTOWLOFn+qPHkvQJYPNjQX1bgDmCn0Z2r6mdJvg3cnuQSugt/fbOqHl8n1WrElcCeVVVJ3gccDxwDHAt8qKr+uT/b+Fd0l364qKpOTvJsYLP+apufBfYAfg58P8nbqur89fFknoa2BX4CUFWPJnkAeAFrONU9yWuAW6rq3iQHAC8HXgsE+HaSN9AF/DvovrluBCxh9X9/W1bV3v0o/nLgoKpameRgum/F7wU+BuxYVQ8n2bJfb6z3fdCH+ufyqiS70r3fO/fL5vT1PAzclOSLVfWTybxYU8UAn7ww9oW7xmunqt6X5FXA/nT/47wROGK6CtSYXgKc3Y/KNgF+3Lf/MzA/yQJgYVXdmeSHwFf7UDi/qpYm2Q+4bOR6P33/NwDnr+sn8jQ11mh7vGOUP5rk/cBLgd/r2w7obz/q/55BF+gzgQuq6pcASb4zaltn9//dBXglcHE/8H82sKJfdi2wIMn5PPF+jfW+D273t4EvAlTVjUlup5suBbikqh7o67kB2J7+w2tdcwpl8q4HVrv+QpLnArOBW8dbqaqWVdWpdOHtZQfWvS8C/7OqXgUcCWwKUFV/AbwP+A/AVUl2raor6ML5LuCsJIcxdkDpCXfS/RsYmY/eAvhZkpNHpioG+p5aVbsABwNnJtmU7vX986qa0992qqozmPh1//f+vwGuH1j/VVV1QL/szcBf0X17Wpxko7He91HbXdPjPjxw/zHW40DYAJ+8S+i+Uh8G0H/F/jzwt3Rf91aTZEaSfQaa5tCduap1awu6QAY4fKQxycv6D9fPAouAXZNsD9xbVV+hu4zEa4Crgb2TbNW/54fQfWVX59s88bq+E/in6pw4EqqjV+ivq7SoX+8i4L39dAZJtk3yQrqpr7f0c9Iz6MJ4LDcBs5L8Vr/+xklekeRZwOyqupRu2mxLYMZY7/uo7V0B/GG/rZ2B7XgaXknVKZRJ6udQ/wD4UpJP0H0IXkh3eYHfott5c+fAKocAxyf5G+CXdCOGI9Zt1RuczUa9B/OBk4Bzk9wFXAXs2C87ut9B9RhwA/Bd4N3AcUkeAR4CDquqFemuunkp3ejswqq6YJ08m6efsV7fL9F9W/kX4Gd0r+EwPk13Mbzf6G8/6KcyHgLeU1U/7PchXUM38FlEd/nq1VTVr5O8E/jLJFvQZdsXgJuBb/RtoRv935/uF8VGv+/bDGzyS8CXkyyj+1GbI/o59CGf1rrhqfSSntaSzKiqh5JsRjcynjfyIzMbOkfgkp7uTk/3M46bAl83vJ/gCFySGuVOTElqlAEuSY0ywCWpUQa4JDXKAJekRv0/NoxmBBvxmB8AAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load data and split into train (80%) and test (20%)\n",
    "data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n",
    "raw_df = pd.read_csv(data_url, sep=\"\\s+\", skiprows=22, header=None)\n",
    "X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n",
    "y = raw_df.values[1::2, 2]\n",
    "Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.20,\n",
    "                                                random_state=10101)\n",
    "\n",
    "# OLS regression using all features\n",
    "lr = linear_model.LinearRegression()\n",
    "lr.fit(Xtrain, ytrain)\n",
    "# Lasso with cross-validated penalization (lambda)\n",
    "lasso = linear_model.LassoCV(cv=5)\n",
    "lasso.fit(Xtrain, ytrain)\n",
    "# L0-regression where the best feature subset is selected via cross-validation\n",
    "intercept, beta = L0_regression(Xtrain, ytrain, seed=10101)\n",
    "\n",
    "# Compare their performance using a bar chart\n",
    "performance = []\n",
    "performance.append(mse(ytest, lr.predict(Xtest)))\n",
    "performance.append(mse(ytest, lasso.predict(Xtest)))\n",
    "performance.append(mse(ytest, np.dot(Xtest, beta) + intercept))\n",
    "plot_bar_chart(performance)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OpothO6VSEzD"
   },
   "source": [
    "Notice that the Lasso performs poorly, as we did not standardize the features to be expressed in the same units (with an average of zero and variance of one). Contrary to OLS and L0-Regression, the Lasso is not scale-invariant because the budget constraint is based on the L1-norm. Remember that $\\beta_l$ is interpreted as the change in the response per unit-change of feature $l$. Since the L1-norm takes the sum of absolute values, how much of the budget $\\beta_l$ consumes depends on the units of measurement of the feature associated to it. \n",
    "\n",
    "Such preprocessing entails three steps, namely:\n",
    "\n",
    "For each feature $x_l$:\n",
    "1. Compute its sample average $\\mu_l$ and sample standard deviation $\\sigma_l$.\n",
    "2. Center by subtracting $\\mu_l$ from $x_l$.\n",
    "3. Scale by dividing the resulting difference by $\\sigma_l$.\n",
    "\n",
    "In order to report the performance of the Lasso after applying standardization, we need to perform hyper-parameter tuning on the L1-norm penalty via cross-validation. Unfortunately, we must not use the model class `LassoCV`. This is due to the fact that standardization is not supported, and doing that beforehand over the whole dataset would contaminate the folds. In order to prevent that from happening, we will perform random search as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "Fsnm7IplSEzD"
   },
   "outputs": [],
   "source": [
    "np.random.seed(10101)\n",
    "num_tries = 500\n",
    "best_alpha = None\n",
    "best_score = -np.inf\n",
    "for i in range(num_tries):\n",
    "    # log-linear search for alpha in the domain [0.001, 1000]\n",
    "    exponent = np.random.uniform(-3, 3)\n",
    "    alpha = np.power(10, exponent)\n",
    "    pipeline = make_pipeline(StandardScaler(), linear_model.Lasso(alpha=alpha))\n",
    "    scores = cross_val_score(pipeline, Xtrain, ytrain, cv=5, scoring='neg_mean_squared_error')\n",
    "    avg_score = np.mean(scores)\n",
    "    if avg_score > best_score:\n",
    "        best_score = avg_score\n",
    "        best_alpha = alpha"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-uDTjWOuSEzE"
   },
   "source": [
    "Let's now compare the performance of the models considered when the features are preprocessed. Notice that our user-defined function `L0-regression` does support standardization of the features:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 452
    },
    "id": "-1jq76jUSEzE",
    "outputId": "1fdb7dc5-ada1-40e9-d6e2-e30d02cb97b5"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAEICAYAAABRSj9aAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAXN0lEQVR4nO3dfbRdVX3u8e8jL1ISlFqjkpAIVuRFRZBci9IqoFKNCl5xqFRBfGmkFxQcKMNCS6lKW7UX9HprLYXaqkFRSNFyqeBF0EFvYRBCJIaAr7wEooDKm1pe5Hf/WOvo5rhPzj7k5ARmvp8xzsjec8259lx7wbPXnmutuVNVSJLa9ZiN3QFJ0oZl0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gV3OS/EmSHyW5J8nvbOz+TCTJ9UlesrH7ofYZ9JpWSQ5PsjLJz5P8MMnfJ9l2Cu3XK/ySbAGcAhxQVbOr6sdD6hyf5Af9B8GaJGc93NfbGJLsm6SSLB1X/py+/JKBsoOSrEhyV5Lbk1yUZId+2UlJ7u/fh7G/O2Z0YzQjDHpNmyTHAh8C3gs8HtgbeCrw1SRbzlA3ngxsBayaoI9vBg4FXlJVs4GFwEUz1LfpdBvwgnHfWN4MfHvsSZKnA58GjqXbHzsCnwAeHGhzVv+BOPa37QbvuWacQa9pkeRxwF8C76yqr1TV/VV1PfA6urB/U1/vn5N8cKDdvknW9I8/AywA/q0/ujxugtd6bJKPJrml//toX/YM4Lq+2h1Jvjak+X8DLqiq7wFU1Q+r6rSBdb8lyeokdyf5fpJ3jO9rkuOS3JpkbZJXJ1mU5NtJfpLk+IH6JyU5O8lZ/fqWJ3nOBNv0mCTvS/K9JD9O8oUkT1jHW34fcC7whr79Zv17vWSgzh7AD6rqourcXVXnVNWN61ivGmTQa7q8gO5I+iHDCVV1D/DvwEsnW0FVHQrcCLyqP7r88ARVT6D7trAH8BzgecCfVdW3gWf2dbatqv2HtL0MOCzJe5Ms7ANy0K3AK4HHAW8BTk3y3IHlT+m3cx5wIvCPdB9iewF/AJyY5GkD9Q8Cvgg8ATgTOLcfXhrvXcCrgRcBc4GfAn83wfaP+TRwWP/4D+m+xdwysHw5sEuSU5Psl2T2JOtTowx6TZcnArdX1QNDlq3tl0+XNwLvr6pbq+o2um8Sh47SsKo+C7yTLhi/Dtya5H0Dy/9PVX2vPwL+OnAhXYCPuR84uaruBz5Pt10f64+WV9GF7e4D9a+sqrP7+qfQfUjsPaRr7wBOqKo1VXUvcBLw2iSbr2Nb/h/whCQ70wX+p8ct/z6wL92H0heA2/tvVIOB/7okdwz8XTzR6+nRy6DXdLkdeOIEwbRdv3zKknxy4ETh2LDIXOCGgWo39GXj2y4YPNE4Vl5VS6rqJcC2wBHA+5P8Yd/m5Uku64dh7gAW8dAPqR9X1S/7x7/o//3RwPJfAINBetPA6z4IrBnWV7rhrX8dC1xgNfBLunMO6/IZ4ChgP+Bfxy+sqsuq6nVVNYfuA+uFdN+IxnyhqrYd+NtvktfTo5BBr+nyn8C9wGsGC5PMAl7Or094/gzYeqDKU8at5yHTqVbVEQMnCv+qL76FLhjHLOChQxZjbW8cPNE4ZPn9VfVF4GrgWUkeC5wD/C3w5P7E5PlAJt7sSc0fe5DkMcD2w/pK94Hw8nGhu1VV3TzJ+j8D/A/g/Kr6+boqVtUVdENrz5rSFuhRz6DXtKiqO+mGUD6e5GVJtugv4/si3VHsZ/qqK4BFSZ6Q5CnAMeNW9SPgaazb54A/SzInyRPpxso/O0o/+8s/X5Fkm/4E6MvpxvUvB7YEHkt3RcsD/bIDRlnvOuyV5DX9N51j6D4MLxtS75PAyUme2vdzTpKDJlt5Vf2Ablz/hPHLkvx+kj9O8qT++S7AgRO8vhpm0Gva9CdPj6c7Ir6LLjxvAl7cjztDF/jfBK6nG/8efw37X9OF+B1J3jPBS30QWEZ3JL6S7qTjByeoO95dfR9vBO4APgz8SVVdWlV3050U/QLdydA/Ar484non8iXg9f36DgVe04/Xj/ex/rUuTHI3XRj/3igv0Pd92LeEO+iCfWU/dPUVuuGdwZPcrx93Hf09Yx8Makf84RFpw0hyEvD0qnrTxu6LNm0e0UtS4yYN+iTzk1zc30SyKsnR45a/J91t10Mvn+vHa69L8t3By9gkSTNj0qGbJNsB21XV8iTbAFcCr66qa5LMB04HdgH2qqrbx7XdjO6W7JfSnZC7Ajikqq6Z/k2RJA0z6RF9Va2tquX947vpru+d1y8+FTiOcZfEDXge8N2q+n5V3Ud3g8mkVxJIkqbPhHfdDdNfLrcncHmSA4Gbq+qbyYSXGc9j4IYRuqP6oVcSJFkMLAaYNWvWXrvssstUuiZJm7Qrr7zy9v7GuN8wctD3t02fQ3ct8AN01+1Odo3xsE+AoUf//cRSpwEsXLiwli1bNmrXJGmTl+SGiZaNdNVNPwnTOcCSqloK/C7dlKffTHI93d1+y/sbYAatYeDOQCa+K1CStIFMekSfblzmDGB1VZ0CUFUrgScN1LkeWDj+ZCzdydedkuwI3Ew3peofTU/XJUmjGOWIfh+6O/r2T/dLNSuSLJqocpK5Sc4H6GcyPAq4gO4k7hf6Gf4kSTNk0iP6qrqUSSZ1qqodBh7fQjfj39jz8+kmhpIkbQTeGStJjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDVu0qBPMj/JxUlWJ1mV5Oi+/ANJrk6yIsmFSeZO0P7dfbtvJflckq2meyMkSRMb5Yj+AeDYqtoV2Bs4MsluwEeqaveq2gM4DzhxfMMk84B3AQur6lnAZsAbpqvzkqTJTRr0VbW2qpb3j+8GVgPzququgWqzgJpgFZsDv5Vkc2Br4Jb167IkaSo2n0rlJDsAewKX989PBg4D7gT2G1+/qm5O8rfAjcAvgAur6sIJ1r0YWAywYMGCqXRLkrQOI5+MTTIbOAc4ZuxovqpOqKr5wBLgqCFtfhs4CNgRmAvMSvKmYeuvqtOqamFVLZwzZ87Ut0SSNNRIQZ9kC7qQX1JVS4dUORM4eEj5S4AfVNVtVXU/sBR4wcPtrCRp6ka56ibAGcDqqjploHyngWoHAtcOaX4jsHeSrfv1vJhujF+SNENGGaPfBzgUWJlkRV92PPC2JDsDDwI3AEcA9JdZnl5Vi6rq8iRnA8vprt65CjhtejdBkrQuqZroYpmNZ+HChbVs2bKN3Q1JetRIcmVVLRy2zDtjJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDVu0qBPMj/JxUlWJ1mV5Oi+/ANJrk6yIsmFSeZO0H7bJGcnubZfx/OneyMkSRMb5Yj+AeDYqtoV2Bs4MsluwEeqaveq2gM4DzhxgvYfA75SVbsAzwFWr3+3JUmj2nyyClW1FljbP747yWpgXlVdM1BtFlDj2yZ5HPBC4PC+/X3AfevfbUnSqKY0Rp9kB2BP4PL++clJbgLeyPAj+qcBtwGfSnJVktOTzJpg3YuTLEuy7LbbbptKtyRJ6zBy0CeZDZwDHFNVdwFU1QlVNR9YAhw1pNnmwHOBv6+qPYGfAe8btv6qOq2qFlbVwjlz5kxxMyRJExkp6JNsQRfyS6pq6ZAqZwIHDylfA6ypqsv752fTBb8kaYaMctVNgDOA1VV1ykD5TgPVDgSuHd+2qn4I3JRk577oxcA14+tJkjacSU/GAvsAhwIrk6zoy44H3tYH+IPADcARAP1llqdX1aK+7juBJUm2BL4PvGX6ui9JmswoV91cCmTIovMnqH8LsGjg+Qpg4cPsnyRpPXlnrCQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1LhJgz7J/CQXJ1mdZFWSo/vyDyS5OsmKJBcmmbuOdWyW5Kok501n5yVJkxvliP4B4Niq2hXYGzgyyW7AR6pq96raAzgPOHEd6zgaWL2+nZUkTd2kQV9Va6tqef/4brrAnldVdw1UmwXUsPZJtgdeAZy+/t2VJE3V5lOpnGQHYE/g8v75ycBhwJ3AfhM0+yhwHLDNJOteDCwGWLBgwVS6JUlah5FPxiaZDZwDHDN2NF9VJ1TVfGAJcNSQNq8Ebq2qKydbf1WdVlULq2rhnDlzRt4ASdK6jRT0SbagC/klVbV0SJUzgYOHlO8DHJjkeuDzwP5JPvsw+ypJehhGueomwBnA6qo6ZaB8p4FqBwLXjm9bVX9aVdtX1Q7AG4CvVdWb1rvXkqSRjTJGvw9wKLAyyYq+7HjgbUl2Bh4EbgCOAOgvszy9qhZNf3clSVM1adBX1aVAhiw6f4L6twC/EfJVdQlwydS69zBkWFc1LWrohVWSHuG8M1aSGmfQS1LjpnQdvTTd8pcOtW0o9RcOtanjEb0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMb5U4KSpuSSS/z5xw1l3303zM8/ekQvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMZNGvRJ5ie5OMnqJKuSHN2XfyDJ1UlWJLkwydxR20qSZs4oR/QPAMdW1a7A3sCRSXYDPlJVu1fVHsB5wIlTaCtJmiGTBn1Vra2q5f3ju4HVwLyqumug2izgN+bXnKjtdHRckjSaKc1Hn2QHYE/g8v75ycBhwJ3AflNpO2T5YmAxwIIFC6bSLUnSOox8MjbJbOAc4Jixo/mqOqGq5gNLgKOm0na8qjqtqhZW1cI5c+ZMZRskSeswUtAn2YIuqJdU1dIhVc4EDn6YbSVJG9AoV90EOANYXVWnDJTvNFDtQODaUdtKkmbOKEf0+wCHAvv3l1KuSLII+Jsk30pyNXAAMHbZ5dwk50/SVpI0QyY9GVtVlwLDfg34/CFlVNUtwKJJ2kqSZoh3xkpS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY2bNOiTzE9ycZLVSVYlObov/0CSq5OsSHJhkrkTtH9ZkuuSfDfJ+6Z7AyRJ6zbKEf0DwLFVtSuwN3Bkkt2Aj1TV7lW1B3AecOL4hkk2A/4OeDmwG3BI31aSNEMmDfqqWltVy/vHdwOrgXlVdddAtVlADWn+POC7VfX9qroP+Dxw0Pp3W5I0qs2nUjnJDsCewOX985OBw4A7gf2GNJkH3DTwfA3wexOsezGwuH96T5LrptK3R6knArdv7E6MLNnYPXgkeNTss5zk/uo9avYZrNc+e+pEC0YO+iSzgXOAY8aO5qvqBOCEJH8KHAX8xfhmQ1Y17MifqjoNOG3U/rQgybKqWrix+6HRuc8efdxnI151k2QLupBfUlVLh1Q5Ezh4SPkaYP7A8+2BW6baSUnSwzfKVTcBzgBWV9UpA+U7DVQ7ELh2SPMrgJ2S7JhkS+ANwJfXr8uSpKkYZehmH+BQYGWSFX3Z8cDbkuwMPAjcABwB0F9meXpVLaqqB5IcBVwAbAb8U1WtmuZteDTbpIaqGuE+e/TZ5PdZqoYOmUuSGuGdsZLUOINekhpn0G9ASbZP8qUk30nyvSQfS7Jlkn2TnDek/iuTXJXkm0muSfKOjdHvTUWSezZ2H1o27P1N8tgkZ/VTolze35szrO1JSW7up1i5JskhG7zDI0oyN8nZG7sfU2HQbyD91UpLgXOraifgGcBs4OQJ6m9Bd9LoVVX1HLob0y6Zmd5KM+ZtwE+r6unAqcCH1lH31H6KlYOAf+j/H1kvSaZ0k+gwVXVLVb12fdczkwz6DWd/4L+q6lMAVfVL4N3AW4Gth9Tfhu4qqB/39e+tqk3h7uBHlCSv6o80r0ryf5M8uS9/UX90uaJftk2S7ZJ8oy/7VpI/6OsekmRlX7auINsUHQT8S//4bODF/UHRhKrqO8DPgd8GSPLeJFf0kyr+5Vi9JH+e5NokX03yuSTv6csvSfJXSb4OHJ1kryRfT3JlkguSbNfXe1f/7eHqJJ/vy4bt9x2SfKtfvlWST/X7+6ok+/XlhydZmuQr/Tf6D0/nmzhV6/3ppgk9E7hysKCq7kpyI/D08ZWr6idJvgzckOQiuoniPldVD85IbzXmUmDvqqokbweOA44F3gMcWVX/0d8l/l90U3ZcUFUn9xP4bd1fXvwhYC/gp8CFSV5dVedujI15BPrVtCj95dd3Ar/DOqYoSPJc4DtVdWuSA4Cd6ObRCvDlJC+k+yA4mO6b8ObAch76/9+2VfWi/lvB14GDquq2JK+n+5b9VuB9wI5VdW+Sbft2w/b7oCP7bXl2kl3o9vcz+mV79P25F7guycer6iY2AoN+wwnDp3uYqJyqenuSZwMvofsP7KXA4Ruqgxpqe+Cs/ihvS+AHffl/AKckWQIsrao1Sa4A/qkPj3OrakWS/YFLquo2gL7+C4FzZ3pDHqFGnhYFeHeSPwaeBrysLzug/7uqfz6bLvi3Ab5UVb8ASPJv49Z1Vv/vzsCzgK/2XyQ2A9b2y64GliQ5l1/vr2H7fXC9vw98HKCqrk1yA90wLcBFVXVn359r6Oai2ShB79DNhrMKeMj8GkkeRzclxPcmalRVK6vqVLqQHzathDasjwP/u6qeDbwD2Aqgqv4GeDvwW8BlSXapqm/QhfjNwGeSHMZ6zkq1CfjVtCj9ePnjgZ8kOXlsiGSg7qlVtTPweuDTSbaie3//uqr26P+eXlVnMPn7/rP+3wCrBto/u6oO6Je9gm5a9b2AK5NsPmy/j1vvul733oHHv2QjHlgb9BvORXRf5Q+DX83N/z+Bf6b7mvkQSWYn2XegaA+6O441sx5PF9wAbx4rTPK7/Yfwh4BlwC5JngrcWlX/SDdNyHPpZnZ9UZIn9vv8ELqhAnW+zK/f19cCX6vOCWPhO75BP7/Wsr7dBcBb+2EUksxL8iS6IbdX9WPms+lCe5jrgDlJnt+33yLJM5M8BphfVRfTDddtC8wett/Hre8bwBv7dT0DWNC/xiOKQzcbSD/G+9+BTyT5c7oP1fPppo94Pt1JqDUDTQ4BjkvyD8Av6I5ADp/ZXm9yth63D04BTgK+mORm4DJgx37ZMf2Jtl8C1wD/Tjd303uT3A/cAxxWVWvTzeZ6Md3R3vlV9aUZ2ZpHnmHv7yfovv18F/gJ3Xs4ivfTTZ64a//3n/0Qyj3Am6rqiv4c1zfpDpCW0U2f/hBVdV+S1wL/K8nj6TLwo8C3gc/2ZaH7NnFHul/SG7/ftxtY5SeATyZZSfcjTYf3Y/wjbtbMcAoESU1IMruq7kmyNd2R9uKxH03a1HlEL6kVp6X7qdKtgH8x5H/NI3pJapwnYyWpcQa9JDXOoJekxhn0ktQ4g16SGvf/AdhPD8iSKe37AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Standardize the features so they have an avg of 0 and a sample var of 1\n",
    "scaler = StandardScaler()\n",
    "scaler.fit(Xtrain)\n",
    "Xtrain_std = scaler.transform(Xtrain)\n",
    "Xtest_std = scaler.transform(Xtest)\n",
    "\n",
    "# OLS regression using all features\n",
    "lr = linear_model.LinearRegression()\n",
    "lr.fit(Xtrain_std, ytrain)\n",
    "\n",
    "# Lasso with cross-validated penalization (lambda)\n",
    "lasso = linear_model.Lasso(alpha=best_alpha)\n",
    "lasso.fit(Xtrain_std, ytrain)\n",
    "# L0-regression where the best feature subset is selected via cross-validation\n",
    "intercept, beta = L0_regression(Xtrain, ytrain, standardize=True, seed=10101)\n",
    "\n",
    "# Compare their performance using a Bar chart\n",
    "performance = []\n",
    "performance.append(mse(ytest, lr.predict(Xtest_std)))\n",
    "performance.append(mse(ytest, lasso.predict(Xtest_std)))\n",
    "performance.append(mse(ytest, np.dot(Xtest_std, beta) + intercept))\n",
    "plot_bar_chart(performance)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "U8xiVKMKSEzE"
   },
   "source": [
    "As expected, the Lasso is better (although marginally) than OLS. This is due to the fact that the Lasso can retrieve the $\\beta_{OLS}$ estimate when the budget $s$ is big enough (alternatively, when $\\lambda$ is small enough). On the other hand, it is marginally worse than L0-Regression, mainly because by shrinking $\\beta$ we add bias to the estimates. Furthermore, observe that L0-Regression achieved the best performance with the fewest number of features. This is convenient, as it leads to a more interpretable model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "gqiC7MsBSEzF",
    "outputId": "fb68f4d5-112f-4c77-e3bc-5f97b59030c5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OLS regression kept 13 features.\n",
      "The Lasso kept 12 features.\n",
      "L0-Regression kept 11 features.\n"
     ]
    }
   ],
   "source": [
    "ols_features = np.sum(np.abs(lr.coef_) >= 1e-8)\n",
    "lasso_features = np.sum(np.abs(lasso.coef_) >= 1e-8)\n",
    "l0_features = np.sum(np.abs(beta) >= 1e-8)\n",
    "print(\"OLS regression kept {0} features.\".format(ols_features))\n",
    "print(\"The Lasso kept {0} features.\".format(lasso_features))\n",
    "print(\"L0-Regression kept {0} features.\".format(l0_features))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Vk6O3k-PSEzF"
   },
   "source": [
    "### Final Model\n",
    "\n",
    "The previous analysis indicates that the best candidate is the model suggested by L0-Regression. The resulting equation is as follows:\n",
    "\n",
    "\\begin{equation}\n",
    "\\text{medv} = 22.56-1.02\\text{crim}+1.46\\text{zn}+0.49\\text{chas}-1.93\\text{nox}+2.53\\text{rm}\n",
    "\\end{equation}\n",
    "\n",
    "\\begin{equation}\n",
    "-3.48\\text{dis}+2.65\\text{rad}-2.22\\text{tax}-1.87\\text{ptratio}+1.00\\text{b}-3.69\\text{lstat}\n",
    "\\end{equation}\n",
    "\n",
    "**Note:** The mean and variance vectors used in the standardization step can be accessed through `scaler.mean_` and `scaler.var_`, respectively. \n",
    "\n",
    "Since we standardized the data, the intercept represents the estimated median value (in thousands) of a house with mean values across features. Likewise, we can interpret $\\beta_1=-1.02$ as the decrease in the house value when the per-capita crime rate increases by one standard deviation from the average value, all other things being equal (similar statements can be made for the rest of the features). Finally, if the main purpose of the analysis is to explain the variability in the response, having 11 features may be too much. However, remember that one can always set the number of active features to a more manageable number to ease the interpretation, perhaps at the expense of predictive power.\n",
    "\n",
    "---\n",
    "## Conclusions\n",
    "\n",
    "It has been shown how mathematical programming can be used to perform feature selection on linear regression problems. It is in fact a good alternative to the Lasso, given that L0-Regression is scale invariant and does not add bias to the weight estimates. Furthermore, this approach is amenable to the specification of additional linear constraints (Bertsimas, 2015), such as:\n",
    "\n",
    "- Enforcing group sparsity among features.\n",
    "- Limiting pairwise multicollinearity.\n",
    "- Limiting global multicollinearity.\n",
    "- Considering a fixed set of nonlinear transformations.\n",
    "\n",
    "Nevertheless, take this result with caution, as \"there is no free lunch in statistics\". That is, no algorithm outperforms all others under all possible datasets. Ultimately, a good data scientist should consider multiple learning algorithms when analyzing a dataset.\n",
    "\n",
    "---\n",
    "## References\n",
    "\n",
    "1. Bertsimas, D., & King, A. (2015). OR forum—An algorithmic approach to linear regression. Operations Research, 64(1), 2-16.\n",
    "2. Bertsimas, D., King, A., & Mazumder, R. (2016). Best subset selection via a modern optimization lens. The annals of statistics, 44(2), 813-852.\n",
    "3. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: springer.\n",
    "4. The Boston housing dataset (1996, October 10). Retrieved from https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NHabIFYMSEzF"
   },
   "source": [
    "Copyright © 2020 Gurobi Optimization, LLC"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
